Santa Fe Institute Working Paper 10-11-029 
arxiv.org:arXiv:1011.1581 [nlin.CD] 

Asymptotic Synchronization for Finite-State Sources 

Nicholas F. Travers 1,2 and James P. Crutchfield 1,2,3,4

1 Complexity Sciences Center 
2 Mathematics Department 
3 Physics Department 
University of California at Davis, 
One Shields Avenue, Davis, CA 95616 
4 Santa Fe Institute 
1399 Hyde Park Road, Santa Fe, NM 87501 
(Dated: January 5, 2011) 

We extend a recent synchronization analysis of exact finite-state sources to nonexact sources for 
which synchronization occurs only asymptotically. Although the proof methods are quite differ- 
ent, the primary results remain the same. We find that an observer's average uncertainty in the 
source state vanishes exponentially fast and, as a consequence, an observer's average uncertainty in 
predicting future output converges exponentially fast to the source entropy rate. 
I. INTRODUCTION 

In Ref. [1] we analyzed the synchronization process for exact ε-machines, where the observer may come to know the internal state of the machine with certainty after only a finite number of measurements. Here, we examine the case of nonexact ε-machines, where the observer may only synchronize to the machine's state asymptotically. Although the analysis differs, the behavior is qualitatively similar to the exact case in the sense that an observer (on average) synchronizes to a nonexact machine exponentially fast. That is, there exist constants $K > 0$ and $0 < \alpha < 1$ such that the average state entropy $U(L) \leq K \alpha^L$, for all $L \in \mathbb{N}$. 

Our development is organized as follows. Section II briefly reviews the synchronization problem and provides the essential definitions for our results. Section III presents an intuitive picture of the synchronization process, using it to derive a formula for $\phi(w)$, the conditional state distribution induced by a word $w$. Section IV establishes a formula for the entropy rate of a finite-state ε-machine. Section V uses the entropy-rate formula to prove the existence of averaged asymptotic synchronization. Section VI builds on this result to prove our main theorem, the Nonexact Machine Synchronization Theorem. Section VII uses this theorem to show that, for any nonexact ε-machine, the state entropy $U(L)$ vanishes exponentially fast and the length-$L$ entropy-rate approximation $h_\mu(L)$ converges exponentially fast to the machine's entropy rate. Finally, Sec. VIII summarizes our results and examines possible extensions. 

II. BACKGROUND 

This section lays out the necessary definitions and background for our results. For a more thorough introduction the reader is referred to Ref. [1], where a similar but more detailed presentation is given. 

A. Machines 

Definition 1. Hidden Markov machine: A finite-state edge-label hidden Markov machine (HMM) consists of 

1. a finite set of states $\mathcal{S} = \{\sigma_1, \ldots, \sigma_N\}$, 

2. a finite alphabet of symbols $\mathcal{A}$, and 

3. a set of $N$ by $N$ symbol-labeled transition matrices $T^{(x)}$, $x \in \mathcal{A}$, where $T^{(x)}_{ij}$ is the probability of transitioning from state $\sigma_i$ to state $\sigma_j$ on symbol $x$. The corresponding internal state-to-state transition matrix is denoted $T = \sum_{x \in \mathcal{A}} T^{(x)}$. 

A hidden Markov machine can be depicted as a directed graph with labeled edges. The nodes are the states $\{\sigma_1, \ldots, \sigma_N\}$ and, for all $x, i, j$ with $T^{(x)}_{ij} > 0$, there is an edge from state $\sigma_i$ to state $\sigma_j$ labeled $p|x$ for the symbol $x$ and transition probability $p = T^{(x)}_{ij}$. We require that the transition matrices $T^{(x)}$ be such that this graph is strongly connected. 

A hidden Markov machine $M$ generates a stationary process $\mathcal{P} = (X_L)_{L \geq 0}$ as follows. Initially, $M$ starts in some state $\sigma_{i^*}$ chosen according to the stationary distribution $\pi$ over machine states, the distribution satisfying $\pi T = \pi$. It then picks an outgoing edge according to the relative transition probabilities $T^{(x)}_{i^* j}$, emits the symbol $x^*$ labeling this edge, and follows the edge to a new state $\sigma_{j^*}$. The next output symbol and state are then chosen in the same fashion, and this procedure is repeated indefinitely. 

We denote by $\mathcal{S}_0, \mathcal{S}_1, \mathcal{S}_2, \ldots$ the random variables (RVs) for the sequence of machine states visited and by $X_0, X_1, X_2, \ldots$ the RVs for the associated sequence of output symbols generated. The sequence of states $(\mathcal{S}_L)_{L \geq 0}$ is a Markov chain with transition kernel $T$. However, the stochastic process we consider is not the sequence of states, but rather the associated sequence of outputs $(X_L)_{L \geq 0}$, which is not normally Markov. We assume that an observer of the process sees the sequence of outputs, but does not have direct access to the machine's "hidden" internal states. 

Example: Even Process Machine. Figure 1 gives an HMM for the Even Process, a machine that has been studied extensively [5]. Its name derives from the feature that in its output there is always an even number of 1s between consecutive 0s. The transition matrices are: 

FIG. 1: A hidden Markov machine (the ε-machine) for the Even Process. The transitions denote the probability $p$ of generating symbol $x$ as $p|x$. 
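
As a concrete illustration of the generation procedure, the following minimal sketch encodes a two-state Even Process machine and samples from it. The edge layout (state 1 emits 0 and stays, or emits 1 and moves to state 2; state 2 always emits 1 and returns) is the commonly used form of this machine, and the value `P = 0.5` is an assumption made here for illustration, not a parameter taken from the text. The names `EVEN`, `stationary`, `generate`, and `runs_between_zeros_even` are hypothetical helpers.

```python
import random

# Assumed parameter (not from the text): probability of emitting 0 from state 1.
P = 0.5

# EVEN[state][symbol] = (probability, next_state). Unifilar by construction:
# each (state, symbol) pair has at most one outgoing edge.
EVEN = {
    1: {"0": (P, 1), "1": (1.0 - P, 2)},
    2: {"1": (1.0, 1)},
}

def stationary(machine, iterations=1000):
    """Approximate pi with pi T = pi, iterating the lazy chain (I + T) / 2."""
    states = sorted(machine)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = {s: 0.0 for s in states}
        for s in states:
            for prob, t in machine[s].values():
                nxt[t] += pi[s] * prob
        # Averaging with the previous vector avoids oscillation for periodic graphs.
        pi = {s: 0.5 * (pi[s] + nxt[s]) for s in states}
    return pi

def generate(machine, pi, length, rng):
    """Draw the start state from pi, then emit `length` symbols edge by edge."""
    states = sorted(machine)
    state = rng.choices(states, weights=[pi[s] for s in states])[0]
    out = []
    for _ in range(length):
        symbols = sorted(machine[state])
        weights = [machine[state][y][0] for y in symbols]
        x = rng.choices(symbols, weights=weights)[0]
        out.append(x)
        state = machine[state][x][1]
    return "".join(out)

def runs_between_zeros_even(sequence):
    """True if every run of 1s enclosed by two 0s has even length."""
    pieces = sequence.split("0")
    return all(len(run) % 2 == 0 for run in pieces[1:-1])

PI = stationary(EVEN)
SAMPLE = generate(EVEN, PI, 2000, random.Random(0))
```

Only runs of 1s enclosed by two 0s are checked, since the first and last runs of a finite sample may be truncated.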

The following notation will be used for sequences of output RVs: 

1. $\overrightarrow{X} = X_0 X_1 X_2 \ldots$, 

2. $\overrightarrow{X}^L = X_0 X_1 \ldots X_{L-1}$, and 

3. $X_t^L = X_t X_{t+1} \ldots X_{t+L-1}$. 

Definition 2. A finite-state ε-machine is a finite-state edge-label hidden Markov machine with the following properties: 

1. Unifilarity: For each state $\sigma_k \in \mathcal{S}$ and each symbol $x \in \mathcal{A}$ there is at most one outgoing edge from state $\sigma_k$ labeled with symbol $x$. 

2. Probabilistically distinct states: For each pair of distinct states $\sigma_k, \sigma_j \in \mathcal{S}$ there exists some finite word $w = x_0 x_1 \ldots x_{L-1}$ such that: 

$$\Pr(\overrightarrow{X}^L = w \,|\, \mathcal{S}_0 = \sigma_k) \neq \Pr(\overrightarrow{X}^L = w \,|\, \mathcal{S}_0 = \sigma_j) .$$

Example (continued). The Even Process machine given above is also an ε-machine. It is clearly unifilar, and $\sigma_1$ can generate the symbol 0 whereas $\sigma_2$ cannot, so the states are probabilistically distinct. 



Remark. e-Machines were originally defined in Ref. 
f 'Sj as hidden Markov machines whose states, known as 
causal states, were the equivalence classes of infinite pasts $\overleftarrow{x}$ with the same probability distribution over futures $\overrightarrow{x}$. This "history machine" definition is, in fact, equivalent to the "generating machine" definition presented above in the finite-state case, although this is not immediately apparent. Formally, it follows from the synchronization results established here and in Ref. [1]. 

We now provide the definitions for two extensions of an ε-machine $M$ that are necessary for our proofs later on: the edge machine $M_{edge}$ and the power machine $M^n$. In what follows: 

1. $\Pr(x|\sigma_k) \equiv \Pr(X_0 = x \,|\, \mathcal{S}_0 = \sigma_k)$, 

2. $\Pr(w|\sigma_k) \equiv \Pr(\overrightarrow{X}^{|w|} = w \,|\, \mathcal{S}_0 = \sigma_k)$, 

3. $I(x,k,j)$ denotes the indicator function of the transition from state $\sigma_k$ to state $\sigma_j$ on symbol $x$, and 

4. $I(w,k,j)$ denotes the indicator function of the transition from state $\sigma_k$ to state $\sigma_j$ on the word $w$. 

That is, $I(x,k,j) = 1$ if $\sigma_k \xrightarrow{x} \sigma_j$ and 0 otherwise; $I(w,k,j) = 1$ if $\sigma_k \xrightarrow{w} \sigma_j$ and 0 otherwise. 

Definition 3. For an ε-machine $M$, the corresponding edge machine $M_{edge}$ is the Markov chain whose states are the outgoing edges of $M$. That is, the states are the pairs $(x, \sigma_k)$ such that $\Pr(x|\sigma_k) > 0$, and the transition probabilities are defined as: 

$$\Pr\big((x,\sigma_k) \to (y,\sigma_j)\big) = \Pr(y|\sigma_j) \, I(x,k,j) .$$

A sequence of $M_{edge}$ states visited by the Markov chain corresponds to a sequence of edges visited by the original machine $M$. The process $\mathcal{P}_{edge}$ generated by $M_{edge}$ can be thought of as the bi-process $(X_L, \mathcal{S}_L)_{L \geq 0}$ generated by the original machine $M$ as it moves from state to state generating symbols. Note that since $M$'s graph is strongly connected, $M_{edge}$'s graph is as well. Hence, the edge-label Markov chain is irreducible and has a unique stationary distribution $\pi_{edge}$. See Fig. 2 (top). 
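
The edge-machine construction can be sketched directly from Definition 3. The example below builds the edge chain for a two-state Even Process machine and checks that the distribution assigning weight $\pi_k \Pr(x|\sigma_k)$ to the edge $(x, \sigma_k)$ is stationary. The Even Process parameter `P = 0.5`, the resulting `PI`, and the helper names are assumptions made here for illustration.

```python
# Assumed example machine: Even Process with P = 0.5 (not a value from the text).
# MACHINE[state][symbol] = (probability, next_state).
P = 0.5
MACHINE = {
    1: {"0": (P, 1), "1": (1.0 - P, 2)},
    2: {"1": (1.0, 1)},
}
PI = {1: 2.0 / 3.0, 2: 1.0 / 3.0}  # stationary distribution for P = 0.5

def edge_machine(machine):
    """Return {edge: {next_edge: prob}} with edges (x, k) having Pr(x | k) > 0."""
    transitions = {}
    for k in machine:
        for x, (_, j) in machine[k].items():
            # From edge (x, k) the machine lands in state j; the next edge
            # (y, j) is then chosen with probability Pr(y | j).
            transitions[(x, k)] = {(y, j): prob for y, (prob, _) in machine[j].items()}
    return transitions

def edge_distribution(machine, pi):
    """Candidate stationary distribution: weight pi_k * Pr(x | k) on edge (x, k)."""
    return {(x, k): pi[k] * prob for k in machine for x, (prob, _) in machine[k].items()}

def step(distribution, transitions):
    """Apply one step of the edge chain to a distribution over edges."""
    out = {edge: 0.0 for edge in transitions}
    for edge, weight in distribution.items():
        for nxt, prob in transitions[edge].items():
            out[nxt] += weight * prob
    return out

EDGE_T = edge_machine(MACHINE)
PI_EDGE = edge_distribution(MACHINE, PI)
PI_EDGE_NEXT = step(PI_EDGE, EDGE_T)
```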

Definition 4. Let $M$ be an ε-machine, and let $n$ be relatively prime to the period $p$ of $M$'s graph. The power machine $M^n$ is defined to be the ε-machine with the states of $M$, output symbols which are length-$n$ words generated by $M$, and transition probabilities given by: 

$$\Pr(\sigma_k \xrightarrow{w} \sigma_j) = \Pr(w|\sigma_k) \, I(w,k,j) .$$

The power machine $M^n$ generates the same process as the original machine $M$, but over length-$n$ blocks. 

Note that since $M$ is by definition unifilar with probabilistically distinct states, $M^n$ is also necessarily unifilar with probabilistically distinct states. Furthermore, it can be shown that for $n$ relatively prime to $p = \mathrm{period}(M)$ the graph of $M^n$ is strongly connected. Therefore, for $n$ relatively prime to $p$, $M^n$ is indeed an ε-machine for the process $\mathcal{P}^n$. See Fig. 2 (bottom). 

FIG. 2: Examples of $M_{edge}$ (top) and $M^2$ (bottom) for the Even Process ε-machine $M$. 

Definition 5. For an ε-machine $M$ the minimum distinguishing length $L^*$ is the shortest length $L$ such that the probability distributions over futures $\overrightarrow{X}^L$ of length $L$ are distinct for each pair of distinct states $\sigma_k$ and $\sigma_j$: 

$$L^* = \min\{L : \Pr(\overrightarrow{X}^L | \mathcal{S}_0 = \sigma_k) \neq \Pr(\overrightarrow{X}^L | \mathcal{S}_0 = \sigma_j), \text{ for all } k \neq j\} .$$

If a machine $M$ has a minimum distinguishing length $L^*$, we also say that $M$ has length-$L^*$ future distinguishable states. 

Note that $L^*$ must be finite for any ε-machine, since ε-machines have probabilistically distinct states, and that, for $n \geq L^*$ and relatively prime to $p = \mathrm{period}(M)$, $M^n$ is an ε-machine with a minimum distinguishing length of 1. 

B. Synchronization 

Although we assume that our observer is not able to directly see the ε-machine's internal state (the $\mathcal{S}_L$'s), it is able to see the output symbols generated by the machine (the $X_L$'s). Thus, the observer may attempt to infer the internal machine state through observations of the output. We are interested in studying the procedure by which the observer synchronizes to the machine's state through these observations. Due to unifilarity, we know that if an observer is able to completely synchronize to the machine's internal state at some time $T \geq 0$, it remains synchronized for all future times $T' \geq T$. For simplicity, we assume that the initial state is chosen according to the stationary distribution $\pi$, so that the process generated by the machine is stationary, and also that the observer has knowledge of this fact. 

For a word $w$ of length $L$ generated by the machine let $\phi(w) = \Pr(\mathcal{S}|w)$ be the observer's belief distribution as to the current state of the machine after observing $w$. That is, 

$$\phi(w)_k = \Pr(\mathcal{S}_L = \sigma_k \,|\, \overrightarrow{X}^L = w) = \Pr(\mathcal{S}_L = \sigma_k \,|\, \overrightarrow{X}^L = w, \mathcal{S}_0 \sim \pi) .$$



We define the observer's uncertainty in the machine state after observing $w$ as: 

$$u(w) = H[\phi(w)] = H[\mathcal{S}_L \,|\, \overrightarrow{X}^L = w] .$$
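
The belief distribution $\phi(w)$ and the uncertainty $u(w)$ can be computed by propagating the stationary distribution through the word one symbol at a time and normalizing. The sketch below does this for a two-state Even Process machine; the parameter `P = 0.5`, the stationary distribution `PI` that goes with it, and the helper names `belief`, `entropy`, and `uncertainty` are assumptions introduced here for illustration.

```python
from math import log2

# Assumed example machine: Even Process with P = 0.5 (not a value from the text).
# EVEN[state][symbol] = (probability, next_state).
P = 0.5
EVEN = {
    1: {"0": (P, 1), "1": (1.0 - P, 2)},
    2: {"1": (1.0, 1)},
}
PI = {1: 2.0 / 3.0, 2: 1.0 / 3.0}  # stationary distribution for P = 0.5

def belief(machine, pi, word):
    """phi(w): distribution over the current state after observing `word`."""
    weights = dict(pi)  # joint weights Pr(S_t = s, symbols seen so far)
    for x in word:
        nxt = {s: 0.0 for s in machine}
        for s, weight in weights.items():
            if x in machine[s]:
                prob, target = machine[s][x]
                nxt[target] += weight * prob
        weights = nxt
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("word cannot be generated by the machine")
    return {s: weight / total for s, weight in weights.items()}

def entropy(distribution):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(p * log2(p) for p in distribution.values() if p > 0.0)

def uncertainty(machine, pi, word):
    """u(w) = H[phi(w)]."""
    return entropy(belief(machine, pi, word))

U_WITH_ZERO = uncertainty(EVEN, PI, "110")  # word containing a 0
U_ONES_ONLY = uncertainty(EVEN, PI, "11")   # word of 1s only
```

For this machine a 0 can only be emitted from state 1 on a self-loop, so any word ending in 0 leaves the observer certain of the state, while a word of 1s alone does not.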

Let $\mathcal{L}(M)$ denote the set of all finite words that $M$ can generate, $\mathcal{L}_L(M)$ the set of all length-$L$ words it can generate, and $\mathcal{L}_\infty(M)$ the set of all infinite sequences $\overrightarrow{x} = x_0 x_1 \ldots$ that it can generate. 

Definition 6. A word $w \in \mathcal{L}(M)$ is a synchronizing word (or sync word) for $M$ if $u(w) = 0$; that is, if the observer knows the current state of the machine with certainty after observing $w$. 

We denote the set of $M$'s infinite synchronizing sequences as SYN($M$) and the set of $M$'s infinite weakly synchronizing sequences as WSYN($M$): 

$$\mathrm{SYN}(M) = \{\overrightarrow{x} \in \mathcal{L}_\infty(M) : u(\overrightarrow{x}^L) = 0 \text{ for some } L\}, \text{ and}$$
$$\mathrm{WSYN}(M) = \{\overrightarrow{x} \in \mathcal{L}_\infty(M) : u(\overrightarrow{x}^L) \to 0 \text{ as } L \to \infty\} .$$

Definition 7. An ε-machine $M$ is exactly synchronizable (or simply exact) if Pr(SYN($M$)) = 1; that is, if the observer synchronizes to almost every (a.e.) sequence the machine generates in finite time. 

Definition 8. An ε-machine $M$ is asymptotically synchronizable if Pr(WSYN($M$)) = 1; that is, if the observer's uncertainty in the machine state vanishes asymptotically for a.e. sequence the machine generates. 

Examples: 

• The Even Process ε-machine is an exact machine. Any word containing a 0 is a sync word for this machine, and almost every $\overrightarrow{x}$ generated by this machine contains at least one 0. 

• The ABC machine (Fig. 3) is not exactly synchronizable, but it is asymptotically synchronizable. 

FIG. 3: The Alternating Biased Coin (ABC) machine: The process it generates can be thought of as alternately flipping two coins of different biases, $p \neq q$. 

We note that any machine with a single state is necessarily exact since the observer is synchronized before observing any output. However, the synchronization question in this case is moot. Thus, when discussing exact or nonexact machines, we will always assume $N \geq 2$. Also, since any finite word $w \in \mathcal{L}(M)$ is contained in a.e. infinite sequence $\overrightarrow{x}$ an ε-machine $M$ generates, we know that a machine $M$ is exact if (and only if) it has some sync word $w$ of finite length. 

One final important quantity to monitor during synchronization is the observer's average uncertainty in the machine state after seeing a length-$L$ block of output. 

Definition 9. The observer's average state uncertainty at time $L$ is: 

$$U(L) = H[\mathcal{S}_L \,|\, \overrightarrow{X}^L] = \sum_{\overrightarrow{x}^L} \Pr(\overrightarrow{x}^L) \, u(\overrightarrow{x}^L) .$$
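
For small $L$ the average state uncertainty can be evaluated exactly by enumerating all length-$L$ words. The sketch below does so for a two-state machine that alternates between two biased coins, in the spirit of the ABC machine. The specific biases (0.8 and 0.3), the edge layout, and all helper names are assumptions made here for illustration; they are not read off Fig. 3.

```python
from itertools import product
from math import log2

# Assumed alternating-coin machine: state 1 emits 0 w.p. 0.8, state 2 emits 0
# w.p. 0.3, and every symbol switches the state. ABC[state][symbol] = (prob, next).
ABC = {
    1: {"0": (0.8, 2), "1": (0.2, 2)},
    2: {"0": (0.3, 1), "1": (0.7, 1)},
}
PI = {1: 0.5, 2: 0.5}  # stationary distribution of the alternating chain

def joint_weights(machine, pi, word):
    """Return {s: Pr(S_L = s, X^L = word)} starting from the distribution pi."""
    weights = dict(pi)
    for x in word:
        nxt = {s: 0.0 for s in machine}
        for s, weight in weights.items():
            if x in machine[s]:
                prob, target = machine[s][x]
                nxt[target] += weight * prob
        weights = nxt
    return weights

def average_state_uncertainty(machine, pi, length):
    """U(L) = sum over words w of Pr(w) * H[phi(w)], in bits."""
    alphabet = sorted({x for edges in machine.values() for x in edges})
    total = 0.0
    for word in product(alphabet, repeat=length):
        weights = joint_weights(machine, pi, word)
        pr_word = sum(weights.values())
        if pr_word > 0.0:
            # Pr(w) * H[phi(w)], written in terms of the joint weights.
            total -= sum(w * log2(w / pr_word) for w in weights.values() if w > 0.0)
    return total

U_VALUES = [average_state_uncertainty(ABC, PI, L) for L in range(9)]
```

Under these assumed biases no finite word identifies the state with certainty, so every entry of `U_VALUES` stays positive while the sequence decreases with $L$.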

C. Prediction 

A process's intrinsic randomness is measured by its entropy rate and that, in turn, determines how well an observer can predict its behavior. 

Definition 10. The block entropy $H(L)$ for a stationary process $\mathcal{P}$ is: 

$$H(L) = H[\overrightarrow{X}^L] = -\sum_{\overrightarrow{x}^L} \Pr(\overrightarrow{x}^L) \log_2 \Pr(\overrightarrow{x}^L) .$$

Definition 11. The entropy rate $h_\mu$ is the asymptotic average entropy per symbol: 

$$h_\mu = \lim_{L \to \infty} \frac{H(L)}{L} = \lim_{L \to \infty} H[X_L \,|\, \overrightarrow{X}^L] .$$

Definition 12. Its length-$L$ approximation is: 

$$h_\mu(L) = H(L) - H(L-1) = H[X_{L-1} \,|\, \overrightarrow{X}^{L-1}] .$$

That is, $h_\mu(L)$ is the observer's average uncertainty in the next symbol to be generated after observing the first $L - 1$ symbols. 

For any stationary process, $h_\mu(L)$ monotonically decreases to the limit $h_\mu$ [4]. However, the form of convergence depends on the process. The lower the value of $h_\mu$ a process has, the better an observer's predictions of the process will be asymptotically. The faster $h_\mu(L)$ converges to $h_\mu$, the faster an observer's predictions will reach this optimal asymptotic level. Since we are often interested in making predictions after only a finite sequence of observations, the source's true entropy rate $h_\mu$, as well as the rate of convergence of $h_\mu(L)$ to $h_\mu$, are both important properties. 
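
The convergence of $h_\mu(L)$ can be observed numerically on a small example. The sketch below computes block entropies by enumeration for a two-state Even Process machine and compares $h_\mu(L)$ with the state-averaged next-symbol entropy $\sum_k \pi_k H[X_0 | \mathcal{S}_0 = \sigma_k]$, which Proposition 1 identifies with $h_\mu$. The parameter `P = 0.5`, the matching `PI`, and the function names are assumptions introduced here for illustration.

```python
from itertools import product
from math import log2

# Assumed example machine: Even Process with P = 0.5 (not a value from the text).
P = 0.5
EVEN = {
    1: {"0": (P, 1), "1": (1.0 - P, 2)},
    2: {"1": (1.0, 1)},
}
PI = {1: 2.0 / 3.0, 2: 1.0 / 3.0}  # stationary distribution for P = 0.5

def word_probability(machine, pi, word):
    """Pr(X^L = word) for a start state drawn from pi."""
    weights = dict(pi)
    for x in word:
        nxt = {s: 0.0 for s in machine}
        for s, weight in weights.items():
            if x in machine[s]:
                prob, target = machine[s][x]
                nxt[target] += weight * prob
        weights = nxt
    return sum(weights.values())

def block_entropy(machine, pi, length):
    """H(L) in bits, by enumerating all candidate length-L words."""
    alphabet = sorted({x for edges in machine.values() for x in edges})
    total = 0.0
    for word in product(alphabet, repeat=length):
        pr = word_probability(machine, pi, word)
        if pr > 0.0:
            total -= pr * log2(pr)
    return total

def entropy_rate_approximation(machine, pi, length):
    """h_mu(L) = H(L) - H(L - 1)."""
    return block_entropy(machine, pi, length) - block_entropy(machine, pi, length - 1)

def state_averaged_entropy(machine, pi):
    """sum_k pi_k * H[X_0 | S_0 = sigma_k], in bits."""
    total = 0.0
    for s, edges in machine.items():
        total -= pi[s] * sum(p * log2(p) for p, _ in edges.values() if p > 0.0)
    return total

APPROXIMATIONS = [entropy_rate_approximation(EVEN, PI, L) for L in range(1, 11)]
H_LIMIT = state_averaged_entropy(EVEN, PI)
```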

Now, for an ε-machine, an observer's prediction of the next output symbol is a direct function of the probability distribution over machine states induced by the previously observed symbols. That is, 

$$\Pr(X_L = x \,|\, \overrightarrow{X}^L = \overrightarrow{x}^L) = \sum_k \Pr(x|\sigma_k) \Pr(\mathcal{S}_L = \sigma_k \,|\, \overrightarrow{X}^L = \overrightarrow{x}^L) .$$

Hence, the better an observer knows the machine state at the current time, the better it can predict the next symbol the machine generates. And, on average, the closer $U(L)$ is to 0, the closer $h_\mu(L)$ is to $h_\mu$. Therefore, the rate of convergence of $h_\mu(L)$ to $h_\mu$ for an ε-machine is closely related to the average rate of synchronization. This is one of the primary motivations for studying the synchronization problem. 

III. AN INTUITIVE PICTURE 

In this section we present an intuitive picture of the synchronization process and use it to derive a formula for the conditional state distribution $\phi(w)$. The basic idea is illustrated schematically in Fig. 4 for a hypothetical 5-state machine. 

FIG. 4: Synchronization illustrated for a 5-state machine. 

Initially, the observer does not know the machine state $\mathcal{S}_0$, so all five states $\{\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5\}$ are possible. After seeing the first symbol $x_0$, there are only four possibilities for $\mathcal{S}_1$, namely $\{\sigma_1, \sigma_2, \sigma_3, \sigma_5\}$, since only four of the five states may generate this symbol. After seeing the second symbol $x_1$, a different set of four states is possible: $\{\sigma_1, \sigma_2, \sigma_3, \sigma_4\}$. After seeing the third symbol $x_2$, there are only three possibilities for $\mathcal{S}_3$ (among them $\sigma_2$ and $\sigma_3$), since two of the state paths merge on seeing the third symbol. Finally, after seeing the fourth symbol $x_3$, two more state paths merge and another dies, so there is only one possibility, $\{\sigma_3\}$, for $\mathcal{S}_4$. The observer has synchronized. 

The transition function $\delta(\sigma_k, x)$ is defined by the relation $\sigma_k \xrightarrow{x} \delta(\sigma_k, x)$ and the word transition function $\delta(\sigma_k, w)$ by the relation $\sigma_k \xrightarrow{w} \delta(\sigma_k, w)$. In general, for each possible initial state $\sigma_k$ and each $\overrightarrow{x}^L$ there is a state path $p^k$ followed under $\overrightarrow{x}^L$. That is, $p^k = p^k_0, p^k_1, \ldots, p^k_L$, where $p^k_0 = \sigma_k$, $p^k_1 = \delta(p^k_0, x_0)$, $p^k_2 = \delta(p^k_1, x_1)$, and so on. An observer synchronizes exactly once all but one of these paths have either merged or died. 
If a machine is not exactly synchronizable, then it is impossible for all paths to merge or die and, at any finite time $L$, there are at least two possible nonmerged paths remaining. However, it is still possible for an observer to synchronize to such a machine asymptotically. To understand how this happens we need to know the relative probabilities of being in each of the possible remaining states at a given time. In general, the probability of starting in state $\sigma_k$ and generating the word $\overrightarrow{x}^L$ is: 

$$\Pr(p^k) \equiv \Pr(\mathcal{S}_0 = \sigma_k, \overrightarrow{X}^L = \overrightarrow{x}^L) = \pi_k \cdot \Pr(\overrightarrow{x}^L | \sigma_k) .$$

These probabilities will be exactly 0 if and only if the path $p^k$ dies by the $L$th symbol. Typically, however, all these probabilities decay (in fact, decay exponentially fast) as $L \to \infty$. For nonexact synchronization, we are concerned not with absolute path probabilities, but with their relative or normalized probabilities. The probability of ending up in state $\sigma_j$ at time $L$ is simply the sum of the normalized probabilities of all paths ending up in state $\sigma_j$. That is, for any word $w = \overrightarrow{x}^L$ we have: 

$$\Pr(\mathcal{S}_L = \sigma_j \,|\, \overrightarrow{X}^L = w) = \frac{\Pr(\mathcal{S}_L = \sigma_j, \overrightarrow{X}^L = w)}{\Pr(\overrightarrow{X}^L = w)} = \frac{\sum_k \pi_k \cdot \Pr(w|\sigma_k) \cdot I(w,k,j)}{\sum_k \pi_k \cdot \Pr(w|\sigma_k)} = \frac{\sum_k \Pr(p^k) \cdot I(w,k,j)}{\sum_k \Pr(p^k)} . \quad (1)$$



For nonexact asymptotic synchronization, then, the important quantities to consider are the relative probabilities of all paths that never merge or die. If a nonexact machine is asymptotically synchronizable, then for a.e. $\overrightarrow{x}$ there must be some state $\sigma_k$ such that the ratio of the path probabilities: 

$$\frac{\Pr(p^k(\overrightarrow{x}^L))}{\Pr(p^j(\overrightarrow{x}^L))} \to \infty ,$$

as $L \to \infty$, for any path $p^j$ which does not eventually merge with $p^k$ or die. Since $\pi_k / \pi_j$ is bounded for all states $\sigma_k$ and $\sigma_j$, the initial state is unimportant for asymptotic synchronization. The question is whether, on average, the transition probabilities for one path are greater than those of the other. If, on average, the transition probabilities for path $p^k$ are $c$ ($> 1$) times as likely as the transition probabilities for path $p^j$ then, for large $L$, $\Pr(p^k)/\Pr(p^j) \approx c^L$. Intuitively, this is why synchronization occurs exponentially fast. Establishing this rigorously, as we will see, requires some care. 
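
The growth of the path-probability ratio can be made concrete on a small example. The sketch below follows the two unifilar state paths of an alternating-coin machine under a fixed word and compares their probabilities. The biases (0.8 and 0.3), the edge layout, the chosen word, and the helper name `path_probability` are assumptions made here for illustration only.

```python
# Assumed alternating-coin machine (biases are illustrative, not from the text).
# ABC[state][symbol] = (probability, next_state).
ABC = {
    1: {"0": (0.8, 2), "1": (0.2, 2)},
    2: {"0": (0.3, 1), "1": (0.7, 1)},
}
PI = {1: 0.5, 2: 0.5}

def path_probability(machine, pi, start, word):
    """Pr(S_0 = start, X^L = word); returns 0.0 if the path dies."""
    prob = pi[start]
    state = start
    for x in word:
        if x not in machine[state]:
            return 0.0  # no outgoing edge with this symbol: the path dies
        step, state = machine[state][x]
        prob *= step
    return prob

def path_ratio(machine, pi, word):
    """Ratio Pr(p^1) / Pr(p^2) of the two path probabilities under `word`."""
    return path_probability(machine, pi, 1, word) / path_probability(machine, pi, 2, word)

# Each block "01" multiplies the path from state 1 by 0.8 * 0.7 and the path
# from state 2 by 0.3 * 0.2, so the ratio grows geometrically in the word length.
RATIOS = [path_ratio(ABC, PI, "01" * n) for n in range(1, 6)]
```

The word `"01" * n` is chosen to favor the path starting in state 1; for a typical sequence the per-symbol factor fluctuates, and only its average behavior determines which path dominates.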



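IV. THE ENTROPY RATE FORMULA 

In this section we derive a formula for the entropy rate of a finite-state ε-machine. Although an analogous expression has been previously established in similar contexts (see, e.g., Ref. [5]), we provide a derivation as well for completeness. The proof presented here is also somewhat simpler than the original in Ref. [5]. A proof quite similar to ours for unifilar Moore hidden Markov models (as opposed to the edge-label or Mealy models we use) is given in Ref. [5]. 

Proposition 1. For any ε-machine $M$, 

$$h_\mu = H[X_0 | \mathcal{S}_0] = \sum_k \pi_k h_k , \quad (2)$$

where $h_k = H[X_0 | \mathcal{S}_0 = \sigma_k]$. 

Proof. We establish the bounds from above and below separately. 

Upper bound: $h_\mu \leq H[X_0 | \mathcal{S}_0]$. We calculate directly: 

$$h_\mu = \lim_{L \to \infty} \frac{H[\overrightarrow{X}^L]}{L} \leq \lim_{L \to \infty} \frac{H[\mathcal{S}_0, \overrightarrow{X}^L]}{L}$$
$$\overset{(a)}{=} \lim_{L \to \infty} \frac{H[\mathcal{S}_0] + \sum_{i=0}^{L-1} H[X_i | \mathcal{S}_0, \overrightarrow{X}^i]}{L}$$
$$\overset{(b)}{=} \lim_{L \to \infty} \frac{H[\mathcal{S}_0] + \sum_{i=0}^{L-1} H[X_i | \mathcal{S}_i]}{L}$$
$$\overset{(c)}{=} \lim_{L \to \infty} \frac{H[\mathcal{S}_0] + H[X_0 | \mathcal{S}_0] \cdot L}{L}$$
$$= H[X_0 | \mathcal{S}_0] ,$$

where step (a) follows from the chain rule, step (b) from unifilarity, and step (c) from stationarity. 

Lower bound: $h_\mu \geq H[X_0 | \mathcal{S}_0]$. We have: 

$$h_\mu = \lim_{L \to \infty} \frac{H[\overrightarrow{X}^L]}{L} \geq \lim_{L \to \infty} \frac{H[\overrightarrow{X}^L | \mathcal{S}_0]}{L}$$
$$\overset{(a)}{=} \lim_{L \to \infty} \frac{\sum_{i=0}^{L-1} H[X_i | \mathcal{S}_0, \overrightarrow{X}^i]}{L}$$
$$\overset{(b)}{=} \lim_{L \to \infty} \frac{\sum_{i=0}^{L-1} H[X_i | \mathcal{S}_i]}{L}$$
$$\overset{(c)}{=} \lim_{L \to \infty} \frac{H[X_0 | \mathcal{S}_0] \cdot L}{L}$$
$$= H[X_0 | \mathcal{S}_0] ,$$

where again step (a) follows from the chain rule, step (b) from unifilarity, and step (c) from stationarity. □ 

V. AVERAGED ASYMPTOTIC SYNCHRONIZATION 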



The entropy rate formula says that (on average) an observer predicts asymptotically just as well as if it knew the machine state exactly: 

$$h_\mu = \lim_{L \to \infty} H[X_L \,|\, \overrightarrow{X}^L] = H[X_0 | \mathcal{S}_0] .$$

Intuitively, this suggests that an observer's average uncertainty $U(L)$ in the machine state must vanish asymptotically. That is, we should have: 

$$\lim_{L \to \infty} U(L) = 0 .$$

These ideas are made rigorous below with a convexity argument. 

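The following notation will be used: 

• Let $p_k = \Pr(X_0 | \mathcal{S}_0 = \sigma_k)$ and $p_w = \Pr(X_0 | \mathcal{S}_0 \sim \phi(w))$. 

• Let $h_k = H[p_k]$ (as above), $h_w = H[p_w]$, and $\tilde{h}_w = \sum_k \phi(w)_k h_k$. 

• Let $A_{\epsilon,L} = \{w \in \mathcal{L}_L(M) : u(w) < \epsilon\}$ and $A^c_{\epsilon,L} = \mathcal{L}_L(M) \setminus A_{\epsilon,L}$, the complement of $A_{\epsilon,L}$. 

We note that for any word $w$: 

$$p_w = \sum_k p_k \, \phi(w)_k , \quad (3)$$

and, hence, by the concavity of the entropy function $H[\cdot]$: 

$$h_w = H[p_w] = H\Big[\sum_k p_k \, \phi(w)_k\Big] \geq \sum_k \phi(w)_k H[p_k] = \sum_k \phi(w)_k h_k = \tilde{h}_w . \quad (4)$$

Also, for any length $L$: 

$$h_\mu(L+1) = H[X_L \,|\, \overrightarrow{X}^L] = \sum_{w \in \mathcal{L}_L(M)} \Pr(w) \, h_w , \quad (5)$$

and 

$$h_\mu = \sum_k \pi_k h_k = \sum_k \Big(\sum_{w \in \mathcal{L}_L(M)} \Pr(w) \, \phi(w)_k\Big) h_k = \sum_{w \in \mathcal{L}_L(M)} \Pr(w) \sum_k \phi(w)_k h_k = \sum_{w \in \mathcal{L}_L(M)} \Pr(w) \, \tilde{h}_w . \quad (6)$$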



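Proposition 2. For any finite-state ε-machine $M$: 

$$\lim_{L \to \infty} U(L) = 0 . \quad (7)$$

Proof. We first prove the statement under the assumption that $M$ has a minimum distinguishing length $L^* = 1$. We then use this result to establish the general case. 

Case (i): $M$ has minimum distinguishing length $L^* = 1$. The proof is by contradiction. If $U(L) \not\to 0$, then there must be some $\epsilon > 0$ for which $\Pr(A^c_{\epsilon,L}) \not\to 0$. Hence, there exists some $\delta > 0$ and a subsequence $(L_i)_{i=1}^{\infty}$ of the $L$s such that $\Pr(A^c_{\epsilon,L_i}) > \delta$, for all $i$. 

Let $\Delta$ be the unit simplex in $\mathbb{R}^N$: 

$$\Delta = \Big\{\phi \in \mathbb{R}^N : \sum_k \phi_k = 1 \text{ and } \phi_k \geq 0, \text{ for all } k\Big\} ,$$

and let: 

$$\Delta_\epsilon = \{\phi \in \Delta : H[\phi] \geq \epsilon\} .$$

Define $f : \Delta_\epsilon \to \mathbb{R}$ by: 

$$f(\phi) = H\Big[\sum_k \phi_k p_k\Big] - \sum_k \phi_k H[p_k] ,$$

so that, for any word $w$, $f(\phi(w)) = h_w - \tilde{h}_w$. 

Then, with respect to $\| \cdot \|_1$, $f(\phi)$ is a continuous function on $\Delta_\epsilon$ and $\Delta_\epsilon$ is a compact set. Therefore, we know $f$ attains its minimum $f^*$ at some point $\phi^* \in \Delta_\epsilon$. 

Since $M$ has a minimum distinguishing length $L^* = 1$, the distributions $p_k$ are distinct for distinct states, and any $\phi \in \Delta_\epsilon$ has at least two nonzero components. Because the entropy function $H[\cdot]$ is strictly concave, we therefore know $f(\phi) > 0$ for all $\phi \in \Delta_\epsilon$. In particular, $f(\phi^*) = f^* > 0$. 

Hence, for each $i$ we have: 

$$h_\mu(L_i + 1) - h_\mu \overset{(a)}{=} \sum_{w \in \mathcal{L}_{L_i}(M)} \Pr(w) \, h_w - \sum_{w \in \mathcal{L}_{L_i}(M)} \Pr(w) \, \tilde{h}_w$$
$$= \sum_{w \in \mathcal{L}_{L_i}(M)} \Pr(w) \cdot (h_w - \tilde{h}_w)$$
$$\overset{(b)}{\geq} \sum_{w \in A^c_{\epsilon,L_i}} \Pr(w) \cdot (h_w - \tilde{h}_w)$$
$$\geq \Pr(A^c_{\epsilon,L_i}) \cdot f^*$$
$$\geq \delta f^* . \quad (8)$$

Step (a) follows from Eqs. (5) and (6) and step (b) follows from Eq. (4). Equation (8) implies that $h_\mu(L) \not\to h_\mu$, which is a contradiction. Hence, we know $\lim_{L \to \infty} U(L) = 0$. 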

Case (ii): $M$ has minimum distinguishing length $L^* > 1$. Take $n \geq L^*$ and relatively prime to the period $p$ of $M$'s graph, so that $M^n$ is an ε-machine with a minimum distinguishing length of 1. Let $Y_L$ be the RV for the $L$th output symbol generated by the machine $M^n$, let $R_L$ be the RV for $M^n$'s $L$th state, and let $V(L) = H[R_L \,|\, \overrightarrow{Y}^L]$. Note that for any $L$: 

$$V(L) = U(nL) . \quad (9)$$

Now, for a contradiction assume $\lim_{L \to \infty} U(L) \neq 0$. Then, since $U(L)$ is monotonically decreasing, we know there exists some $\epsilon > 0$ such that $U(L) \geq \epsilon$, for all $L$. Thus, by Eq. (9) we know that $V(L) \geq \epsilon$ for all $L$ as well, so $V(L) \not\to 0$. However, since $M^n$ has a minimum distinguishing length of 1, by case (i) above we know that $V(L)$ must go to zero. This contradiction implies that $\lim_{L \to \infty} U(L) = 0$. □ 

VI. THE NONEXACT MACHINE 
SYNCHRONIZATION THEOREM 

In this section we prove our primary result, the Nonexact Machine Synchronization Theorem. This extends the weak asymptotic synchronization result of Sec. V to show that synchronization occurs exponentially fast for nonexact machines. The statement is quite analogous to the Exact Machine Synchronization Theorem given in Ref. [1]. Essentially, it says that, except on a set of words $\overrightarrow{x}^L$ of exponentially small probability, an observer's uncertainty after observing $\overrightarrow{x}^L$ is exponentially small. 

The following notation will be used. $\Phi_L = \phi(\overrightarrow{X}^L)$ is the random variable for the belief distribution over states induced by the first length-$L$ word the machine generates, and $S_L$ is the most likely state in $\Phi_L$ (in case of a tie, the lowest numbered state is taken). $P_L = \Pr(S_L)$ is the probability of the most likely state in the distribution $\Phi_L$, and $Q_L = \Pr(\text{NOT } S_L)$ is the combined probability of all other states in the distribution $\Phi_L$. For example, if $\Phi_L = (0.2, 0.7, 0.1)$, then $S_L = \sigma_2$, $P_L = 0.7$, and $Q_L = 0.3$. Realizations are denoted $\phi_L$, $s_L$, $p_L$, and $q_L$, respectively. We also define $U_L = H[\Phi_L]$ and $u_L = H[\phi_L]$. 

Theorem 1. (Nonexact Machine Synchronization Theorem) For any nonexact ε-machine $M$, 

1. There exist constants $K_1 > 0$ and $0 < \alpha_1 < 1$ such that: 

$$\Pr(Q_L > \alpha_1^L) \leq K_1 \alpha_1^L , \text{ for all } L \in \mathbb{N} .$$

2. There exist constants $K_2 > 0$ and $0 < \alpha_2 < 1$ such that: 

$$\Pr(U_L > \alpha_2^L) \leq K_2 \alpha_2^L , \text{ for all } L \in \mathbb{N} .$$

The proof strategy is as follows. We first take a power machine $M^n$ of the machine $M$ with $U(n) = \epsilon \ll 1$, and prove the theorem for the power machine. We then use the exponential convergence of the power machine to establish the theorem in general with a subsequence-type argument. 

The following lemma on large deviations of Markov 
chains will be critical. 



Lemma 1. Let $Z_0, Z_1, \ldots$ be a finite-state, irreducible Markov chain, with state set $R = \{r_1, \ldots, r_n\}$ and equilibrium distribution $\rho = (\rho_1, \ldots, \rho_n)$. Let $F : R \to \mathbb{R}$, $Y_L = F(Z_L)$, and $\overline{Y}_L = \frac{1}{L}(Y_0 + \ldots + Y_{L-1})$. Define $\mu_F = E_\rho(F) = \sum_k \rho_k F(r_k)$. Then, for any $\epsilon > 0$, there exist constants $K > 0$ and $0 < \alpha < 1$ such that, for any state $r_k$: 

$$\Pr\big(|\overline{Y}_L - \mu_F| > \epsilon \,\big|\, Z_0 = r_k\big) \leq K \alpha^L , \text{ for all } L \in \mathbb{N} .$$

Proof. A similar statement (with more explicit values of the constants) is given in Ref. [7] for a general class of Markov chains, which includes all finite-state, irreducible, aperiodic chains. The result stated here follows directly for finite-state, irreducible, aperiodic chains, and can be extended to the periodic case by considering length-$p$ blocks, where $p$ is the chain's period. □ 

Remark. Note that since the deviation bound holds conditionally on any initial state $r_k$, it also holds conditionally on any distribution over the initial state by linearity. In particular, we apply this lemma assuming $Z_0 \sim \rho$. 

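Let us denote: 

$$\Pr(x, \sigma_k) = \Pr(\mathcal{S}_0 = \sigma_k, X_0 = x) ,$$
$$\Pr(x | \sigma_k) = \Pr(X_0 = x \,|\, \mathcal{S}_0 = \sigma_k) ,$$
$$\Pr(\sigma_k | x) = \Pr(\mathcal{S}_0 = \sigma_k \,|\, X_0 = x) , \text{ and}$$
$$\sigma_{\max,x} = \mathrm{argmax}_{\sigma_k} \Pr(\sigma_k | x) ,$$

where again the lowest numbered state is chosen in the case of a tie for $\sigma_{\max,x}$. Also, for any $x$ and $\sigma_j$ with $\Pr(x|\sigma_j) > 0$, let us define: 

$$\mathcal{S}_{x,j} = \{\sigma_k \in \mathcal{S} : \Pr(x|\sigma_k) > 0 , \ \delta(\sigma_k, x) \neq \delta(\sigma_j, x)\} ,$$
$$g(x, \sigma_j) = \max_{\sigma_k \in \mathcal{S}_{x,j}} \Pr(x|\sigma_k) , \text{ and}$$
$$f(x, \sigma_j) = \max_{\sigma_k \in \mathcal{S}_{x,j}} \Pr(\sigma_k|x) .$$

Note that $g(x, \sigma_j)$ and $f(x, \sigma_j)$ are both always strictly positive for nonexact ε-machines. Note also that, for any joint length-$L$ realization $(\overrightarrow{x}^L, \overrightarrow{s}^L)$ of symbols and states: 

$$\frac{p_L}{q_L} \geq \frac{\Pr(s_0)}{\Pr(\text{NOT } s_0)} \prod_{i=0}^{L-1} \frac{\Pr(x_i | s_i)}{g(x_i, s_i)} , \quad (10)$$

by Eq. (1). Here, $\Pr(s_0) = \pi_k$ is the stationary probability of the state $s_0 = \sigma_k$ and $\Pr(\text{NOT } s_0) = 1 - \Pr(s_0)$. 

Using Lemma 1 we now prove our desired theorem under the (relatively strong) assumption that: 

$$E_{\pi_{edge}}\Big\{\log_2\Big(\frac{\Pr(X_0 | \mathcal{S}_0)}{g(X_0, \mathcal{S}_0)}\Big)\Big\} > 0 . \quad (11)$$

This assumption will later be satisfied for some power machine $M^n$. 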



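Lemma 2. Let $M$ be a nonexact ε-machine satisfying Eq. (11). Then 

1. There exist constants $K_1 > 0$ and $0 < \alpha_1 < 1$ such that: 

$$\Pr(Q_L > \alpha_1^L) \leq K_1 \alpha_1^L , \text{ for all } L \in \mathbb{N} .$$

2. There exist constants $K_2 > 0$ and $0 < \alpha_2 < 1$ such that: 

$$\Pr(U_L > \alpha_2^L) \leq K_2 \alpha_2^L , \text{ for all } L \in \mathbb{N} .$$

Proof. We first prove Claim 1 and then use this to show Claim 2. 

Proof of Claim 1: Consider the edge-label Markov process $\mathcal{P}_{edge}$ generated by the edge machine $M_{edge}$. Let $Z_L = (X_L, \mathcal{S}_L)$ denote the RV for the $L$th $M_{edge}$ state and let: 

$$Y_L = F(Z_L) = \log_2\Big(\frac{\Pr(X_L | \mathcal{S}_L)}{g(X_L, \mathcal{S}_L)}\Big) .$$

We assume, of course, that $(X_0, \mathcal{S}_0) \sim \pi_{edge}$ or, equivalently, $\mathcal{S}_0 \sim \pi$. 

By hypothesis, $\mu_F = E_{\pi_{edge}}(F) = C > 0$. Take $\epsilon = C/2$. By Lemma 1 there exist constants $B_1 > 0$ and $0 < \eta_1 < 1$ such that: 

$$\Pr(|\overline{Y}_L - \mu_F| > \epsilon) \leq B_1 \eta_1^L , \text{ for all } L \in \mathbb{N} .$$

Thus, for any $L$: 

$$\Pr\Big(\frac{1}{L} \sum_{i=0}^{L-1} \log_2\Big(\frac{\Pr(X_i | \mathcal{S}_i)}{g(X_i, \mathcal{S}_i)}\Big) < \frac{C}{2}\Big) = \Pr(\overline{Y}_L < C/2)$$
$$= \Pr(\overline{Y}_L < \mu_F - \epsilon)$$
$$\leq \Pr(|\overline{Y}_L - \mu_F| > \epsilon)$$
$$\leq B_1 \eta_1^L .$$

Now, let $(\overrightarrow{x}^L, \overrightarrow{s}^L)$ be any typical sequence, i.e.: 

$$\frac{1}{L} \sum_{i=0}^{L-1} \log_2\Big(\frac{\Pr(x_i | s_i)}{g(x_i, s_i)}\Big) \geq \frac{C}{2} .$$

Taking logarithms of Eq. (10) we find: 

$$\log_2\Big(\frac{p_L}{q_L}\Big) \geq \log_2\Big(\frac{\Pr(s_0)}{\Pr(\text{NOT } s_0)}\Big) + \sum_{i=0}^{L-1} \log_2\Big(\frac{\Pr(x_i | s_i)}{g(x_i, s_i)}\Big) \geq \beta + \frac{C}{2} L ,$$

where $\beta = \min_k \log_2\big(\frac{\pi_k}{1 - \pi_k}\big)$. Or, equivalently: 

$$\frac{p_L}{q_L} \geq 2^{\beta} \cdot 2^{\frac{C}{2} L} = B_2 \eta_2^L ,$$

where $B_2 = 2^{\beta} > 0$ and $\eta_2 = 2^{C/2} > 1$. Thus, since $p_L \leq 1$: 

$$q_L \leq \frac{q_L}{p_L} \leq B_3 \eta_3^L ,$$

where $B_3 = 1/B_2 > 0$ and $\eta_3 = 1/\eta_2 < 1$. Since this holds for any typical sequence $(\overrightarrow{x}^L, \overrightarrow{s}^L)$ we have, for each $L$: 

$$\Pr(Q_L > B_3 \eta_3^L) \leq B_1 \eta_1^L .$$

And, therefore, for any $1 > \alpha_1 > \max\{\eta_1, \eta_3\}$ there exists some $K_1 = K_1(\alpha_1)$ sufficiently large that: 

$$\Pr(Q_L > \alpha_1^L) \leq K_1 \alpha_1^L , \text{ for all } L \in \mathbb{N} .$$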

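Proof of Claim 2: By Claim 1 we know there exist constants $K_1 > 0$ and $0 < \alpha_1 < 1$ such that: 

$$\Pr(Q_L > \alpha_1^L) \leq K_1 \alpha_1^L , \text{ for all } L \in \mathbb{N} .$$

Let us define: 

$$V_L^+ = \{\overrightarrow{x}^L : q_L > \alpha_1^L\} \text{ and } V_L^- = \{\overrightarrow{x}^L : q_L \leq \alpha_1^L\} .$$

Take $L_1$ sufficiently large that $1 - \alpha_1^L > 1/2$, for all $L \geq L_1$. Note that the first-order Taylor expansion of $\log_2(x)$ about $x = 1$ gives $\log_2(1 - \alpha_1^L) = -\log_2(e) \alpha_1^L + O(\alpha_1^{2L}) \approx -1.44 \alpha_1^L + O(\alpha_1^{2L})$. Thus, there exists some $L_2 \in \mathbb{N}$ such that $|\log_2(1 - \alpha_1^L)| \leq 2 \alpha_1^L$ for all $L \geq L_2$. Take $L_0 = \max\{L_1, L_2\}$. 

Then, for any $L \geq L_0$ and any $\overrightarrow{x}^L \in V_L^-$, we have: 

$$H[\mathcal{S}_L \,|\, \overrightarrow{x}^L] \overset{(a)}{\leq} H\Big[\Big(1 - \alpha_1^L, \frac{\alpha_1^L}{N-1}, \ldots, \frac{\alpha_1^L}{N-1}\Big)\Big]$$
$$= -(1 - \alpha_1^L) \log_2(1 - \alpha_1^L) - \alpha_1^L \log_2\Big(\frac{\alpha_1^L}{N-1}\Big)$$
$$= -(1 - \alpha_1^L) \log_2(1 - \alpha_1^L) - \alpha_1^L L \log_2(\alpha_1) + \alpha_1^L \log_2(N-1)$$
$$\overset{(b)}{\leq} (1 - \alpha_1^L) 2 \alpha_1^L - \alpha_1^L L \log_2(\alpha_1) + \alpha_1^L \log_2(N-1)$$
$$\leq L C_1 \alpha_1^L$$
$$\leq C_2 \alpha^L , \quad (12)$$

where $C_1 = 2 - \log_2(\alpha_1) + \log_2(N-1) > 0$, step (a) follows from the fact that $1 - \alpha_1^L > 1/2$ for $L \geq L_1$, and step (b) follows from the Taylor expansion bound on $|\log_2(1 - \alpha_1^L)|$ for $L \geq L_2$. In the last line, $\alpha$ may be chosen as any real number in the interval $(\alpha_1, 1)$ and $C_2 = C_2(\alpha)$ is chosen sufficiently large to ensure the last inequality holds for all $L \geq L_0$. 

Equation (12) implies that, for all $L \geq L_0$: 

$$\Pr(U_L \leq C_2 \alpha^L) \geq \Pr(V_L^-) \geq 1 - K_1 \alpha_1^L .$$

So, we know that, for all $L \geq L_0$: 

$$\Pr(U_L > C_2 \alpha^L) \leq K_1 \alpha_1^L \leq K_1 \alpha^L .$$

Therefore, for any $\alpha_2 \in (\alpha, 1)$ and $L$ sufficiently large: 

$$\Pr(U_L > \alpha_2^L) \leq K_1 \alpha_2^L .$$

And, hence, there exists some $K_2 \geq K_1$ such that: 

$$\Pr(U_L > \alpha_2^L) \leq K_2 \alpha_2^L , \text{ for all } L \in \mathbb{N} .$$

□ 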
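
The entropy bound used in Eq. (12) can be checked numerically. The sketch below compares the entropy of the distribution $(1 - \alpha_1^L, \alpha_1^L/(N-1), \ldots, \alpha_1^L/(N-1))$ with $L C_1 \alpha_1^L$ over a range of $L$. The values `ALPHA_1 = 0.5` and `N_STATES = 3` and the function names are assumptions chosen here for illustration; for these values both the condition $1 - \alpha_1^L > 1/2$ and the Taylor bound hold from $L = 2$ onward.

```python
from math import log2

# Illustrative values only (not from the text).
ALPHA_1 = 0.5
N_STATES = 3

def spread_entropy(a, n_states):
    """Entropy in bits of (1 - a, a/(N-1), ..., a/(N-1)) for 0 < a < 1."""
    rest = a / (n_states - 1)
    return -(1.0 - a) * log2(1.0 - a) - (n_states - 1) * rest * log2(rest)

def linear_exponential_bound(alpha_1, n_states, length):
    """L * C_1 * alpha_1^L with C_1 = 2 - log2(alpha_1) + log2(N - 1)."""
    c1 = 2.0 - log2(alpha_1) + log2(n_states - 1)
    return length * c1 * alpha_1 ** length

# (L, entropy, bound) triples for L = 2, ..., 40.
CHECKS = [
    (L, spread_entropy(ALPHA_1 ** L, N_STATES), linear_exponential_bound(ALPHA_1, N_STATES, L))
    for L in range(2, 41)
]
```

The extra factor of $L$ in the bound is what forces the final step of Eq. (12) to pass to a slightly larger base $\alpha \in (\alpha_1, 1)$.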

To establish the theorem in general now, we show that for any machine $M$ there exists a power machine $M^n$ satisfying Eq. (11). To do so requires several additional lemmas. 

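Lemma 3. Let $M$ be a nonexact ε-machine. Then, for all $x$ and $\sigma_j$ with $\Pr(x|\sigma_j) > 0$: 

$$g(x, \sigma_j) \leq f(x, \sigma_j) \cdot \frac{\Pr(x) \lambda^2}{\Pr(\sigma_j)} ,$$

where $\lambda = \max_{i,j} \pi_i / \pi_j$ and $\Pr(\sigma_j)$ and $\Pr(x)$ are the respective stationary probabilities of the state $\sigma_j$ and symbol $x$: $\Pr(\sigma_j) = \pi_j$ and $\Pr(x) = \Pr(X_0 = x \,|\, \mathcal{S}_0 \sim \pi)$. 

Proof. Fix $x$ and $\sigma_j$. Take $\sigma_{k_1} \in \mathcal{S}_{x,j}$ such that: 

$$\frac{\Pr(\sigma_{k_1}|x)}{\Pr(\sigma_{k_1})} = \max_{\sigma_k \in \mathcal{S}_{x,j}} \frac{\Pr(\sigma_k|x)}{\Pr(\sigma_k)} ,$$

and take $\sigma_{k_2} \in \mathcal{S}_{x,j}$ such that: 

$$\Pr(\sigma_{k_2}|x) = \max_{\sigma_k \in \mathcal{S}_{x,j}} \Pr(\sigma_k|x) .$$

Then: 

$$\frac{\Pr(\sigma_{k_2}|x)}{\Pr(\sigma_{k_2})} \geq \frac{\Pr(\sigma_{k_1}|x)}{\Pr(\sigma_{k_2})} = \frac{\Pr(\sigma_{k_1}|x)}{\Pr(\sigma_{k_1})} \cdot \frac{\Pr(\sigma_{k_1})}{\Pr(\sigma_{k_2})} \geq \frac{\Pr(\sigma_{k_1}|x)}{\Pr(\sigma_{k_1})} \cdot \frac{1}{\lambda} ,$$

and 

$$\frac{\Pr(\sigma_{k_2}|x)}{\Pr(\sigma_j)} = \frac{\Pr(\sigma_{k_2}|x)}{\Pr(\sigma_{k_2})} \cdot \frac{\Pr(\sigma_{k_2})}{\Pr(\sigma_j)} \geq \frac{\Pr(\sigma_{k_2}|x)}{\Pr(\sigma_{k_2})} \cdot \frac{1}{\lambda} .$$

Combining these relations we see that: 

$$\frac{\Pr(\sigma_{k_1}|x)}{\Pr(\sigma_{k_1})} \leq \lambda^2 \frac{\Pr(\sigma_{k_2}|x)}{\Pr(\sigma_j)} .$$

And, therefore: 

$$g(x, \sigma_j) = \max_{\sigma_k \in \mathcal{S}_{x,j}} \{\Pr(x|\sigma_k)\}$$
$$= \max_{\sigma_k \in \mathcal{S}_{x,j}} \Big\{\Pr(\sigma_k|x) \cdot \frac{\Pr(x)}{\Pr(\sigma_k)}\Big\}$$
$$= \max_{\sigma_k \in \mathcal{S}_{x,j}} \{\Pr(\sigma_k|x) / \Pr(\sigma_k)\} \cdot \Pr(x)$$
$$= \frac{\Pr(\sigma_{k_1}|x)}{\Pr(\sigma_{k_1})} \cdot \Pr(x)$$
$$\leq \lambda^2 \frac{\Pr(\sigma_{k_2}|x)}{\Pr(\sigma_j)} \cdot \Pr(x)$$
$$= f(x, \sigma_j) \cdot \frac{\Pr(x) \lambda^2}{\Pr(\sigma_j)} .$$

□ 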

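Define $A_\epsilon \equiv A_{\epsilon,1} = \{ x \in \mathcal{A} : u(x) < \epsilon \}$.

Lemma 4. Let $M$ be a nonexact ε-machine such that:

1. $\Pr(A_\epsilon) > 1 - \epsilon$, for some $\epsilon < 1/2$, and

2. $\Pr(\sigma_{\max,x}|x) / \left( f(x, \sigma_{\max,x}) \lambda^2 \right) \geq 2^{2N^2\lambda^2}$, for all $x \in A_\epsilon$.

Then, $E \left\{ \log_2 \left( \frac{\Pr(X_0|S_0)}{g(X_0, S_0)} \right) \right\} > 0$.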
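Proof. Applying Lemma 3 we see:

$$E \left\{ \log_2 \left( \frac{\Pr(X_0|S_0)}{g(X_0, S_0)} \right) \right\} \geq \sum_x \sum_j \Pr(x, \sigma_j) \log_2 \left( \frac{\Pr(\sigma_j|x) \Pr(x) / \Pr(\sigma_j)}{f(x, \sigma_j) \lambda^2 \Pr(x) / \Pr(\sigma_j)} \right)$$

$$= \sum_x \Pr(x) \sum_j \Pr(\sigma_j|x) \log_2 \left( \frac{\Pr(\sigma_j|x)}{f(x, \sigma_j) \lambda^2} \right) . \qquad (13)$$

Now, for any $x \in \mathcal{A}$ we have:

$$\sum_j \Pr(\sigma_j|x) \log_2 \left( \frac{\Pr(\sigma_j|x)}{f(x, \sigma_j) \lambda^2} \right) \geq \sum_j \lambda^2 \cdot \frac{\Pr(\sigma_j|x)}{\lambda^2} \log_2 \left( \frac{\Pr(\sigma_j|x)}{\lambda^2} \right)$$

$$\geq \sum_j \lambda^2 \cdot \left( -H \left( \frac{\Pr(\sigma_j|x)}{\lambda^2} \right) \right)$$

$$\geq -N \lambda^2 ,$$

where $H(\cdot)$ is the binary entropy function. So:

$$\sum_{x \in A_\epsilon^c} \Pr(x) \sum_j \Pr(\sigma_j|x) \log_2 \left( \frac{\Pr(\sigma_j|x)}{f(x, \sigma_j) \lambda^2} \right) \geq \Pr(A_\epsilon^c) \cdot (-N \lambda^2) \geq -\epsilon N \lambda^2 . \qquad (14)$$

Also, if we let $S^- = S \setminus \{ \sigma_{\max,x} \}$, then for any $x \in A_\epsilon$:

$$\sum_j \Pr(\sigma_j|x) \log_2 \left( \frac{\Pr(\sigma_j|x)}{f(x, \sigma_j) \lambda^2} \right) = \Pr(\sigma_{\max,x}|x) \log_2 \left( \frac{\Pr(\sigma_{\max,x}|x)}{f(x, \sigma_{\max,x}) \lambda^2} \right) + \sum_{\sigma_j \in S^-} \Pr(\sigma_j|x) \log_2 \left( \frac{\Pr(\sigma_j|x)}{f(x, \sigma_j) \lambda^2} \right)$$

$$\geq \frac{1}{2} \cdot 2N^2\lambda^2 + \sum_{\sigma_j \in S^-} \lambda^2 \cdot \frac{\Pr(\sigma_j|x)}{\lambda^2} \log_2 \left( \frac{\Pr(\sigma_j|x)}{\lambda^2} \right)$$

$$\geq \frac{1}{2} \cdot 2N^2\lambda^2 - N\lambda^2$$

$$\geq 2N\lambda^2 - N\lambda^2$$

$$= N\lambda^2 .$$

And, hence:

$$\sum_{x \in A_\epsilon} \Pr(x) \sum_j \Pr(\sigma_j|x) \log_2 \left( \frac{\Pr(\sigma_j|x)}{f(x, \sigma_j) \lambda^2} \right) \geq \Pr(A_\epsilon) \cdot N\lambda^2 \geq (1 - \epsilon) N\lambda^2 . \qquad (15)$$

Combining Eqs. (13), (14), and (15), we see that

$$E \left\{ \log_2 \left( \frac{\Pr(X_0|S_0)}{g(X_0, S_0)} \right) \right\} \geq (1 - 2\epsilon) N\lambda^2 \equiv C ,$$

where $C > 0$ for $\epsilon < 1/2$. Since $M$ is not exactly synchronizable, we know $g(x, \sigma_j)$ is always strictly positive, so this expectation must be finite. Hence, there exists some real number $C' \geq C > 0$ such that:

$$E \left\{ \log_2 \left( \frac{\Pr(X_0|S_0)}{g(X_0, S_0)} \right) \right\} = C' .$$

□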

Remark. In the above proof we implicitly assumed $\Pr(x, \sigma_j) \neq 0$ for all $x$ and $j$. The sums for the expectation are, of course, computed only over those $x$ and $j$ for which $\Pr(x, \sigma_j) \neq 0$. Terms involving pairs $(x, \sigma_j)$ with $\Pr(x, \sigma_j) = 0$ should be omitted.

Lemma 5. For any nonexact ε-machine $M$, there exists some $n \in \mathbb{N}$ such that the power machine $M^n$ is an ε-machine with

$$E \left\{ \log_2 \left( \frac{\Pr(Y_0|R_0)}{g(Y_0, R_0)} \right) \right\} > 0 ,$$

where $Y_L$ is the RV for the $L$th output symbol generated by the machine $M^n$ and $R_L$ is the RV for the $L$th $M^n$-state.

We also denote the alphabet of $M^n$ as $\mathcal{B}$ and the set $A_\epsilon$ for the machine $M^n$ as $B_\epsilon$; i.e., $B_\epsilon \equiv A_{\epsilon,n}$ for $M$. We define $\hat{\sigma}(\phi)$ to be the most likely state in a distribution $\phi$ over the machine states, and $\Pr(\hat{\sigma}(\phi))$ to be the probability of this state in the distribution $\phi$. For example, if $\phi = (0.3, 0.1, 0.2, 0.4)$, then $\hat{\sigma}(\phi) = \sigma_4$ and $\Pr(\hat{\sigma}(\phi)) = 0.4$.

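Proof. Given any nonexact ε-machine $M$,

1. Take $\epsilon' = 1 / \left( 2\lambda^2 \, 2^{2N^2\lambda^2} \right)$.

2. Take $\epsilon$ small enough that $\Pr(\hat{\sigma}(\phi)) \geq 1 - \epsilon'$ for any state distribution $\phi$ with $H[\phi] < \epsilon$. (Without loss of generality, we may assume $\epsilon < 1/2$.)

3. For $\epsilon$ as above, take $n$ relatively prime to the period $p$ of $M$'s graph and large enough such that $\Pr(A_{\epsilon,n}) > 1 - \epsilon$. (Note that this is possible since $\lim_{L \to \infty} \Pr(A_{\epsilon,L}) = 1$, for all $\epsilon > 0$, since $\lim_{L \to \infty} U(L) = 0$.)

Then, $M^n$ is an ε-machine for the process $\mathcal{P}^n$ and $\Pr(B_\epsilon) = \Pr(A_{\epsilon,n}) > 1 - \epsilon$. Moreover, for all $y \in B_\epsilon$ we have:

$$H[\phi(y)] < \epsilon \overset{(a)}{\Longrightarrow} \Pr(\hat{\sigma}(\phi(y))) \geq 1 - \epsilon'$$

$$\Longrightarrow \frac{\Pr(\sigma_{\max,y}|y)}{f(y, \sigma_{\max,y})} \geq \frac{1 - \epsilon'}{\epsilon'}$$

$$\overset{(b)}{\Longrightarrow} \frac{\Pr(\sigma_{\max,y}|y)}{f(y, \sigma_{\max,y})} \geq \lambda^2 \, 2^{2N^2\lambda^2}$$

$$\Longrightarrow \frac{\Pr(\sigma_{\max,y}|y)}{f(y, \sigma_{\max,y}) \lambda^2} \geq 2^{2N^2\lambda^2} ,$$

where step (a) follows from item 2 above and step (b) follows from our choice of $\epsilon'$. Hence, by Lemma 4:

$$E \left\{ \log_2 \left( \frac{\Pr(Y_0|R_0)}{g(Y_0, R_0)} \right) \right\} = C' > 0 .$$

(Note that $\lambda$ for $M$ is the same as $\lambda$ for $M^n$, since $M$ and $M^n$ have the same stationary distribution $\pi$.) □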

Finally, in order to convert between $U_L$ and $Q_L$ convergence in our theorem we need one last lemma.

Lemma 6. For any $Q_L$:

1. If $Q_L \leq 1/2$, then $U_L \geq Q_L$.

2. If $Q_L > 1/2$, then $U_L \geq H(1/N)$, where $H(\cdot)$ is the binary entropy function.

Proof. Note that:

$$U_L = H[\Phi_L] \geq H[(1 - Q_L, Q_L, 0, \ldots, 0)] = H(Q_L) .$$

Since $H(Q_L) \geq Q_L \log_2(1/Q_L)$, we know $H(Q_L) \geq Q_L$ for $Q_L \leq 1/2$. Since $H(Q_L)$ is monotonically decreasing on $[1/2, 1 - 1/N]$ and $Q_L$ is at most $1 - 1/N$, we know $H(Q_L) \geq H(1 - 1/N) = H(1/N)$ for $Q_L > 1/2$. □
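The two cases of Lemma 6 can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the choice $N = 4$ are illustrative assumptions, and the check only confirms that the binary entropy $H(q)$ dominates the stated lower bound on a grid of values $q \in [0, 1 - 1/N]$.

```python
import math

def binary_entropy(q: float) -> float:
    """Binary entropy H(q) in bits, with H(0) = H(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def lemma6_lower_bound(q: float, n_states: int) -> float:
    """Lower bound on U_L stated in Lemma 6 for a given value Q_L = q."""
    if not 0.0 <= q <= 1.0 - 1.0 / n_states:
        raise ValueError("Q_L must lie in [0, 1 - 1/N]")
    # Case 1: Q_L <= 1/2 gives U_L >= Q_L; Case 2: Q_L > 1/2 gives U_L >= H(1/N).
    return q if q <= 0.5 else binary_entropy(1.0 / n_states)

N = 4  # hypothetical number of machine states, chosen only for illustration
grid = [k / 1000 * (1.0 - 1.0 / N) for k in range(1001)]
# U_L >= H(Q_L), so it suffices to compare H(q) against the bound.
bound_holds = all(
    binary_entropy(q) >= lemma6_lower_bound(q, N) - 1e-12 for q in grid
)
```

Here `bound_holds` evaluates the inequality $H(Q_L) \geq$ (bound) pointwise; since $U_L \geq H(Q_L)$ by the first line of the proof, this is the only step that needs checking.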

Using these lemmas we can now prove the primary result of this section.

Proof. (Nonexact Machine Synchronization Theorem) We first prove Claim 2 of the theorem and then use this to show Claim 1.

Proof of Claim 2: Given any nonexact ε-machine $M$, take a power machine $M^n$ as in Lemma 5 such that:

$$E \left\{ \log_2 \left( \frac{\Pr(Y_0|R_0)}{g(Y_0, R_0)} \right) \right\} > 0 .$$

Denote the random variable $U_L$ for the machine $M^n$ as $V_L$ and the quantity $U(L)$ for the machine $M^n$ as $V(L)$. By Lemma 2 we know there exist constants $B_1 > 0$ and $0 < \eta_1 < 1$ such that:

$$\Pr(V_L > \eta_1^L) \leq B_1 \eta_1^L , \text{ for all } L \in \mathbb{N} .$$

A proof identical to that of Prop. 3 below then shows there exists some $B_2 \geq B_1$ such that:

$$V(L) \leq B_2 \eta_1^L , \text{ for all } L \in \mathbb{N} .$$

Or, equivalently:

$$U(nL) \leq B_2 \eta_1^L , \text{ for all } L \in \mathbb{N} .$$

Taking $\eta_2 = \eta_1^{1/n}$ we have:

$$U(m) \leq B_2 \eta_2^m ,$$

for any length $m$ that is an integer multiple of $n$. Since $U(m) \leq \log_2(N)$ for any $m$ and is monotonically decreasing, it follows that:

$$U(m) \leq K \eta_2^m , \text{ for all } m \in \mathbb{N} ,$$

where $K = \max\{B_2, \log_2(N)\} / \eta_2^n$. Thus, by Markov's inequality, we know that for any $m \in \mathbb{N}$ and $t > 0$:

$$\Pr(U_m \geq t) \leq \frac{E(U_m)}{t} = \frac{U(m)}{t} \leq \frac{K \eta_2^m}{t} .$$

Taking $t = \eta_2^{m/2}$ yields:

$$\Pr(U_m \geq \alpha^m) \leq K \alpha^m ,$$

where $\alpha = \eta_2^{1/2}$.
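The passage from multiples of $n$ to all lengths $m$, followed by the Markov step, can be sanity-checked on a synthetic sequence. The sketch below is an illustration under stated assumptions: the constants `B2`, `eta1`, `n`, and `N` are hypothetical, and `worst_case_U` is not an actual machine's $U(m)$ but a nonincreasing sequence chosen to be as large as the two hypotheses (the bound at multiples of $n$ and the bound $\log_2 N$) allow.

```python
import math

def interpolation_constants(B2: float, eta1: float, n: int, N: int):
    """Return (K, eta2) with eta2 = eta1**(1/n) and K = max(B2, log2 N) / eta2**n."""
    eta2 = eta1 ** (1.0 / n)
    K = max(B2, math.log2(N)) / eta2 ** n
    return K, eta2

def worst_case_U(m: int, B2: float, eta1: float, n: int, N: int) -> float:
    """Nonincreasing sequence with U(m) <= log2 N and U(nL) <= B2 * eta1**L."""
    return min(math.log2(N), B2 * eta1 ** (m // n))

# Hypothetical constants, chosen only for illustration.
B2, eta1, n, N = 4.0, 0.5, 3, 5
K, eta2 = interpolation_constants(B2, eta1, n, N)
alpha = math.sqrt(eta2)  # alpha = eta2**(1/2), as in the Markov step

lengths = range(0, 60)
# Interpolated bound U(m) <= K * eta2**m for every m, not only multiples of n.
mean_bound_holds = all(
    worst_case_U(m, B2, eta1, n, N) <= K * eta2 ** m * (1 + 1e-9) for m in lengths
)
# Markov with t = alpha**m: Pr(U_m >= alpha**m) <= U(m) / alpha**m <= K * alpha**m.
tail_bound_holds = all(
    worst_case_U(m, B2, eta1, n, N) / alpha ** m <= K * alpha ** m * (1 + 1e-9)
    for m in lengths
)
```

The division by $\eta_2^n$ in $K$ is what absorbs the up to $n - 1$ steps between consecutive multiples of $n$, during which the sequence is only known to be nonincreasing.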

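Proof of Claim 1: By Claim 2 we know there exist constants $K > 0$ and $0 < \alpha < 1$ such that:

$$\Pr(U_L > \alpha^L) \leq K \alpha^L , \text{ for all } L \in \mathbb{N} .$$

Take $L_0$ large enough that $\alpha^{L_0} < H(1/N)$. Then, for all $L \geq L_0$ we have:

$$\Pr(Q_L > \alpha^L) = \Pr(\alpha^L < Q_L \leq 1/2) + \Pr(Q_L > 1/2)$$

$$\overset{(*)}{\leq} \Pr(U_L > \alpha^L, Q_L \leq 1/2) + \Pr(U_L \geq H(1/N))$$

$$\leq \Pr(U_L > \alpha^L) + \Pr(U_L > \alpha^L)$$

$$\leq 2K \alpha^L ,$$

where step (*) follows from Lemma 6. Hence, for some $K' \geq 2K$ we have:

$$\Pr(Q_L > \alpha^L) \leq K' \alpha^L , \text{ for all } L \in \mathbb{N} .$$

□

VII. CONSEQUENCES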



As a direct consequence of Thm. 1 we establish exponential convergence results for $U(L)$ and $h_\mu(L)$ analogous to those in the exact case [1]. We also use Thm. 1 to prove the existence of pointwise almost everywhere (a.e.) exponential synchronization for nonexact machines. This establishes that any ε-machine is indeed asymptotically synchronizable in the pointwise sense defined earlier.

A. Exponential Convergence of $U(L)$

Proposition 3. For any nonexact ε-machine $M$ there exist constants $K > 0$ and $0 < \alpha < 1$ such that

$$U(L) \leq K \alpha^L , \text{ for all } L \in \mathbb{N} .$$

Proof. Let $M$ be any nonexact ε-machine. Then by Thm. 1 there exist constants $C > 0$ and $0 < \alpha < 1$ such that $\Pr(U_L > \alpha^L) \leq C \alpha^L$, for all $L \in \mathbb{N}$. Define:

$$A_L = \{ w \in \mathcal{L}_L(M) : u(w) \leq \alpha^L \} \text{ and } A_L^c = \mathcal{L}_L(M) \setminus A_L .$$

Then,

$$U(L) = \sum_{w \in \mathcal{L}_L(M)} \Pr(w) u(w)$$

$$= \sum_{w \in A_L} \Pr(w) u(w) + \sum_{w \in A_L^c} \Pr(w) u(w)$$

$$\leq \Pr(A_L) \cdot \alpha^L + \Pr(A_L^c) \cdot \log_2(N)$$

$$\leq 1 \cdot \alpha^L + C \alpha^L \cdot \log_2(N)$$

$$= K \alpha^L ,$$

where $K = 1 + C \log_2(N)$.

□
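The splitting step in the proof of Prop. 3 converts a tail bound into a bound on the mean. A minimal sketch of that step follows; the distribution of uncertainty values, the threshold, and $N = 3$ are made-up numbers used only to exercise the inequality, not data from any machine.

```python
import math

def mean_bound_from_tail(values, probs, threshold, n_states):
    """Split E[u] over {u <= threshold} and its complement, as in the proof.

    Returns (exact mean, upper bound Pr(A) * threshold + Pr(A^c) * log2 N),
    using u <= threshold on A and u <= log2 N everywhere.
    """
    mean = sum(p * u for u, p in zip(values, probs))
    p_good = sum(p for u, p in zip(values, probs) if u <= threshold)
    bound = p_good * threshold + (1.0 - p_good) * math.log2(n_states)
    return mean, bound

# Hypothetical word uncertainties u(w) (each at most log2 N) and probabilities Pr(w).
N = 3
values = [0.001, 0.004, 0.3, 1.2]
probs = [0.6, 0.3, 0.07, 0.03]
mean, bound = mean_bound_from_tail(values, probs, threshold=0.01, n_states=N)
split_bound_holds = mean <= bound
```

In the proposition the threshold is $\alpha^L$ and the complement has probability at most $C \alpha^L$, which is what turns the right-hand side into $(1 + C \log_2 N) \alpha^L$.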



B. Exponential Convergence of $h_\mu(L)$

Proposition 4. For any nonexact ε-machine $M$, there exist constants $K > 0$ and $0 < \alpha < 1$ such that:

$$h_\mu(L) - h_\mu \leq K \alpha^L , \text{ for all } L \in \mathbb{N} .$$

Proof. This follows directly from Prop. 3 and Lemma 7 below. □
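Lemma 7. For any ε-machine $M$ and any $L \in \mathbb{N}$:

$$h_\mu(L + 1) - h_\mu \leq U(L) . \qquad (16)$$

Proof. Note that:

$$H[\overrightarrow{X}^L, X_L, S_L] = H[\overrightarrow{X}^L] + H[S_L | \overrightarrow{X}^L] + H[X_L | \overrightarrow{X}^L, S_L]$$

$$= H[\overrightarrow{X}^L] + H[S_L | \overrightarrow{X}^L] + H[X_L | S_L]$$

$$= H[\overrightarrow{X}^L] + H[S_L | \overrightarrow{X}^L] + h_\mu \qquad (17)$$

and also that:

$$H[\overrightarrow{X}^L, X_L, S_L] = H[\overrightarrow{X}^L] + H[X_L | \overrightarrow{X}^L] + H[S_L | \overrightarrow{X}^L, X_L] . \qquad (18)$$

Equating the RHS of Eqs. (17) and (18) gives:

$$H[S_L | \overrightarrow{X}^L] + h_\mu = H[X_L | \overrightarrow{X}^L] + H[S_L | \overrightarrow{X}^L, X_L] \geq H[X_L | \overrightarrow{X}^L] . \qquad (19)$$

Or, in other words:

$$U(L) + h_\mu \geq h_\mu(L + 1) . \qquad (20)$$

□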

Remark. If we define the synchronization and prediction decay constants, respectively, as:

$$\alpha_s = \limsup_{L \to \infty} U(L)^{1/L}$$

$$\alpha_p = \limsup_{L \to \infty} \left( h_\mu(L) - h_\mu \right)^{1/L} ,$$

then Lemma 7 also implies that $\alpha_p \leq \alpha_s$. This is to say, the observer's predictions approach their optimal level at least as fast as the observer synchronizes. Since Lemma 7 applies to any ε-machine, this statement also holds for any ε-machine (exact or nonexact).
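The behavior of $U(L)$ and the inequality of Lemma 7 can be observed on a small example. The sketch below assumes a hypothetical two-state machine that is not taken from the paper: state A emits 0 with probability 0.1, state B emits 0 with probability 0.9, and every emission switches the state. No finite word identifies the state, so synchronization is only asymptotic. The code enumerates all words up to a given length, tracks the observer's state distribution $\phi(w)$, and computes $U(L)$ and $h_\mu(L + 1)$ directly from their definitions.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Hypothetical machine: Pr(emit 0 | state), deterministic next state, stationary pi.
P_ZERO = {"A": 0.1, "B": 0.9}
NEXT = {"A": "B", "B": "A"}
PI = {"A": 0.5, "B": 0.5}

def emit_prob(state, x):
    return P_ZERO[state] if x == 0 else 1.0 - P_ZERO[state]

def update(phi, x):
    """Return (Pr(x | phi), state distribution after observing symbol x)."""
    joint = {}
    for s, ps in phi.items():
        joint[NEXT[s]] = joint.get(NEXT[s], 0.0) + ps * emit_prob(s, x)
    px = sum(joint.values())
    return px, {s: v / px for s, v in joint.items()}

# Entropy rate: average over pi of the per-state emission entropy.
H_MU = sum(PI[s] * entropy_bits([P_ZERO[s], 1.0 - P_ZERO[s]]) for s in PI)

def uncertainty_curves(max_len):
    """Return U[L] and h_next[L] = h_mu(L + 1) for L = 0, ..., max_len."""
    beliefs = [(1.0, dict(PI))]  # pairs (Pr(w), phi(w)), starting from the empty word
    U, h_next = [], []
    for _ in range(max_len + 1):
        U.append(sum(pw * entropy_bits(phi.values()) for pw, phi in beliefs))
        h_next.append(sum(
            pw * entropy_bits(
                [sum(phi[s] * emit_prob(s, x) for s in phi) for x in (0, 1)]
            )
            for pw, phi in beliefs
        ))
        extended = []
        for pw, phi in beliefs:
            for x in (0, 1):
                px, post = update(phi, x)
                extended.append((pw * px, post))
        beliefs = extended
    return U, h_next

U, h_next = uncertainty_curves(10)
lemma7_holds = all(h_next[L] - H_MU <= U[L] + 1e-12 for L in range(len(U)))
```

For this machine the quantity `U[L] ** (1 / L)` gives a crude finite-length estimate of the synchronization decay constant $\alpha_s$; it is only an estimate, since $\alpha_s$ is defined through a limit superior.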



C. Pointwise a.e. Asymptotic Synchronization 

Proposition 5. For any nonexact ε-machine $M$ there exists some $0 < \alpha < 1$ such that for a.e. $\overrightarrow{x} \in \mathcal{L}_\infty(M)$, there exists $L_0 \in \mathbb{N}$ such that for all $L \geq L_0$,

$$u(\overrightarrow{x}^L) < \alpha^L .$$

Proof. Apply the Borel-Cantelli Lemma to Thm. 1. □
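One way the Borel-Cantelli step might be spelled out is sketched below. It assumes Thm. 1 supplies constants $K > 0$ and $0 < \alpha < 1$ with $\Pr(U_L > \alpha^L) \leq K \alpha^L$; the event names $E_L$ and the enlarged constant $\alpha' \in (\alpha, 1)$ are notation introduced here for illustration only.

```latex
\begin{align*}
E_L &= \{ \overrightarrow{x} : u(\overrightarrow{x}^L) > \alpha^L \} , \qquad
\sum_{L=1}^{\infty} \Pr(E_L) \leq \sum_{L=1}^{\infty} K \alpha^L
  = \frac{K \alpha}{1 - \alpha} < \infty \\
&\Longrightarrow \Pr(E_L \text{ occurs for infinitely many } L) = 0
  \quad \text{(first Borel--Cantelli lemma)} \\
&\Longrightarrow \text{for a.e. } \overrightarrow{x} \text{ there exists } L_0
  \text{ with } u(\overrightarrow{x}^L) \leq \alpha^L < (\alpha')^L
  \text{ for all } L \geq L_0 .
\end{align*}
```

Replacing $\alpha$ by any $\alpha' \in (\alpha, 1)$ gives the strict inequality in the statement of the proposition.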
VIII. CONCLUSION 

We analyzed the process of asymptotic synchronization to nonexact ε-machines. Although the treatment is more involved mathematically, the primary results are essentially the same as those for the exact case given in Ref. [1]. An observer's average state uncertainty $U(L)$ vanishes exponentially fast and, consequently, an observer's average uncertainty in predictions $h_\mu(L)$ converges to the machine's entropy rate $h_\mu$ exponentially fast, as well.

We hope to extend the asymptotic synchronization results to more general model classes such as countable-state ε-machines or nonunifilar HMMs. We also intend to improve the bounds on the constant $\alpha$ given in the convergence theorems.